{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/embeddings/together.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chunk + Document Hybrid Retrieval with Long-Context Embeddings (Together.ai) \n",
    "\n",
    "This notebook shows how to use long-context together.ai embedding models for advanced RAG. We index each document by running the embedding model over the entire document text, as well as embedding each chunk. We then define a custom retriever that can compute both node similarity as well as document similarity.\n",
    "\n",
    "Visit https://together.ai and sign up to get an API key."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup and Download Data\n",
    "\n",
    "We load in our documentation. For the sake of speed we load in just 10 pages, but of course if you want to stress test your model you should load in all of it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-embeddings-together\n",
    "%pip install llama-index-llms-openai\n",
    "%pip install llama-index-embeddings-openai\n",
    "%pip install llama-index-readers-file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "domain = \"docs.llamaindex.ai\"\n",
    "docs_url = \"https://docs.llamaindex.ai/en/latest/\"\n",
    "!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.readers.file import UnstructuredReader\n",
    "from pathlib import Path\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core import Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package punkt to /Users/jerryliu/nltk_data...\n",
      "[nltk_data]   Package punkt is already up-to-date!\n",
      "[nltk_data] Downloading package averaged_perceptron_tagger to\n",
      "[nltk_data]     /Users/jerryliu/nltk_data...\n",
      "[nltk_data]   Package averaged_perceptron_tagger is already up-to-\n",
      "[nltk_data]       date!\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Idx 0/8\n",
      "docs.llamaindex.ai/en/latest/index.html\n",
      "Idx 1/8\n",
      "docs.llamaindex.ai/en/latest/contributing/contributing.html\n",
      "Idx 2/8\n",
      "docs.llamaindex.ai/en/latest/understanding/understanding.html\n",
      "Idx 3/8\n",
      "docs.llamaindex.ai/en/latest/understanding/using_llms/using_llms.html\n",
      "Idx 4/8\n",
      "docs.llamaindex.ai/en/latest/understanding/using_llms/privacy.html\n",
      "Idx 5/8\n",
      "docs.llamaindex.ai/en/latest/understanding/loading/llamahub.html\n",
      "Idx 6/8\n",
      "docs.llamaindex.ai/en/latest/optimizing/production_rag.html\n",
      "Idx 7/8\n",
      "docs.llamaindex.ai/en/latest/module_guides/models/llms.html\n"
     ]
    }
   ],
   "source": [
    "reader = UnstructuredReader()\n",
    "# all_files_gen = Path(\"./docs.llamaindex.ai/\").rglob(\"*\")\n",
    "# all_files = [f.resolve() for f in all_files_gen]\n",
    "# all_html_files = [f for f in all_files if f.suffix.lower() == \".html\"]\n",
    "\n",
    "# curate a subset\n",
    "all_html_files = [\n",
    "    \"docs.llamaindex.ai/en/latest/index.html\",\n",
    "    \"docs.llamaindex.ai/en/latest/contributing/contributing.html\",\n",
    "    \"docs.llamaindex.ai/en/latest/understanding/understanding.html\",\n",
    "    \"docs.llamaindex.ai/en/latest/understanding/using_llms/using_llms.html\",\n",
    "    \"docs.llamaindex.ai/en/latest/understanding/using_llms/privacy.html\",\n",
    "    \"docs.llamaindex.ai/en/latest/understanding/loading/llamahub.html\",\n",
    "    \"docs.llamaindex.ai/en/latest/optimizing/production_rag.html\",\n",
    "    \"docs.llamaindex.ai/en/latest/module_guides/models/llms.html\",\n",
    "]\n",
    "\n",
    "\n",
    "# TODO: set to higher value if you want more docs\n",
    "doc_limit = 10\n",
    "\n",
    "docs = []\n",
    "for idx, f in enumerate(all_html_files):\n",
    "    if idx > doc_limit:\n",
    "        break\n",
    "    print(f\"Idx {idx}/{len(all_html_files)}\")\n",
    "    loaded_docs = reader.load_data(file=f, split_documents=True)\n",
    "    # Hardcoded Index. Everything before this is ToC for all pages\n",
    "    # Adjust this start_idx to suit your needs\n",
    "    start_idx = 64\n",
    "    loaded_doc = Document(\n",
    "        id_=str(f),\n",
    "        text=\"\\n\\n\".join([d.get_content() for d in loaded_docs[start_idx:]]),\n",
    "        metadata={\"path\": str(f)},\n",
    "    )\n",
    "    print(str(f))\n",
    "    docs.append(loaded_doc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building Hybrid Retrieval with Chunk Embedding + Parent Embedding\n",
    "\n",
    "Define a custom retriever that does the following:\n",
    "- First retrieve relevant chunks based on embedding similarity\n",
    "- For each chunk, lookup the source document embedding.\n",
    "- Weight it by an alpha.\n",
    "\n",
    "This is essentially vector retrieval with a reranking step that reweights the node similarities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You can set the API key in the embeddings or env\n",
    "# import os\n",
    "# os.environ[\"TOEGETHER_API_KEY\"] = \"your-api-key\"\n",
    "\n",
    "from llama_index.embeddings.together import TogetherEmbedding\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.llms.openai import OpenAI\n",
    "\n",
    "api_key = \"<api_key>\"\n",
    "\n",
    "embed_model = TogetherEmbedding(\n",
    "    model_name=\"togethercomputer/m2-bert-80M-32k-retrieval\", api_key=api_key\n",
    ")\n",
    "\n",
    "llm = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Document Store \n",
    "\n",
    "Create docstore for original documents. Embed each document, and put in docstore.\n",
    "\n",
    "We will refer to this later in our hybrid retrieval algorithm! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.storage.docstore import SimpleDocumentStore\n",
    "\n",
    "for doc in docs:\n",
    "    embedding = embed_model.get_text_embedding(doc.get_content())\n",
    "    doc.embedding = embedding\n",
    "\n",
    "docstore = SimpleDocumentStore()\n",
    "docstore.add_documents(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build Vector Index\n",
    "\n",
    "Let's build the vector index of chunks. Each chunk will also have a reference to its source document through its `index_id` (which can then be used to lookup the source document in the docstore)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.schema import IndexNode\n",
    "from llama_index.core import (\n",
    "    load_index_from_storage,\n",
    "    StorageContext,\n",
    "    VectorStoreIndex,\n",
    ")\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.core import SummaryIndex\n",
    "from llama_index.core.retrievers import RecursiveRetriever\n",
    "import os\n",
    "from tqdm.notebook import tqdm\n",
    "import pickle\n",
    "\n",
    "\n",
    "def build_index(docs, out_path: str = \"storage/chunk_index\"):\n",
    "    nodes = []\n",
    "\n",
    "    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=70)\n",
    "    for idx, doc in enumerate(tqdm(docs)):\n",
    "        # print('Splitting: ' + str(idx))\n",
    "\n",
    "        cur_nodes = splitter.get_nodes_from_documents([doc])\n",
    "        for cur_node in cur_nodes:\n",
    "            # ID will be base + parent\n",
    "            file_path = doc.metadata[\"path\"]\n",
    "            new_node = IndexNode(\n",
    "                text=cur_node.text or \"None\",\n",
    "                index_id=str(file_path),\n",
    "                metadata=doc.metadata\n",
    "                # obj=doc\n",
    "            )\n",
    "            nodes.append(new_node)\n",
    "    print(\"num nodes: \" + str(len(nodes)))\n",
    "\n",
    "    # save index to disk\n",
    "    if not os.path.exists(out_path):\n",
    "        index = VectorStoreIndex(nodes, embed_model=embed_model)\n",
    "        index.set_index_id(\"simple_index\")\n",
    "        index.storage_context.persist(f\"./{out_path}\")\n",
    "    else:\n",
    "        # rebuild storage context\n",
    "        storage_context = StorageContext.from_defaults(\n",
    "            persist_dir=f\"./{out_path}\"\n",
    "        )\n",
    "        # load index\n",
    "        index = load_index_from_storage(\n",
    "            storage_context, index_id=\"simple_index\", embed_model=embed_model\n",
    "        )\n",
    "\n",
    "    return index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index = build_index(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Hybrid Retriever\n",
    "\n",
    "We define a hybrid retriever that can first fetch chunks by vector similarity, and then reweight it based on similarity with the parent document (using an alpha parameter)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.retrievers import BaseRetriever\n",
    "from llama_index.core.indices.query.embedding_utils import get_top_k_embeddings\n",
    "from llama_index.core import QueryBundle\n",
    "from llama_index.core.schema import NodeWithScore\n",
    "from typing import List, Any, Optional\n",
    "\n",
    "\n",
    "class HybridRetriever(BaseRetriever):\n",
    "    \"\"\"Hybrid retriever.\"\"\"\n",
    "\n",
    "    def __init__(\n",
    "        self,\n",
    "        vector_index,\n",
    "        docstore,\n",
    "        similarity_top_k: int = 2,\n",
    "        out_top_k: Optional[int] = None,\n",
    "        alpha: float = 0.5,\n",
    "        **kwargs: Any,\n",
    "    ) -> None:\n",
    "        \"\"\"Init params.\"\"\"\n",
    "        super().__init__(**kwargs)\n",
    "        self._vector_index = vector_index\n",
    "        self._embed_model = vector_index._embed_model\n",
    "        self._retriever = vector_index.as_retriever(\n",
    "            similarity_top_k=similarity_top_k\n",
    "        )\n",
    "        self._out_top_k = out_top_k or similarity_top_k\n",
    "        self._docstore = docstore\n",
    "        self._alpha = alpha\n",
    "\n",
    "    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:\n",
    "        \"\"\"Retrieve nodes given query.\"\"\"\n",
    "\n",
    "        # first retrieve chunks\n",
    "        nodes = self._retriever.retrieve(query_bundle.query_str)\n",
    "\n",
    "        # get documents, and embedding similiaryt between query and documents\n",
    "\n",
    "        ## get doc embeddings\n",
    "        docs = [self._docstore.get_document(n.node.index_id) for n in nodes]\n",
    "        doc_embeddings = [d.embedding for d in docs]\n",
    "        query_embedding = self._embed_model.get_query_embedding(\n",
    "            query_bundle.query_str\n",
    "        )\n",
    "\n",
    "        ## compute doc similarities\n",
    "        doc_similarities, doc_idxs = get_top_k_embeddings(\n",
    "            query_embedding, doc_embeddings\n",
    "        )\n",
    "\n",
    "        ## compute final similarity with doc similarities and original node similarity\n",
    "        result_tups = []\n",
    "        for doc_idx, doc_similarity in zip(doc_idxs, doc_similarities):\n",
    "            node = nodes[doc_idx]\n",
    "            # weight alpha * node similarity + (1-alpha) * doc similarity\n",
    "            full_similarity = (self._alpha * node.score) + (\n",
    "                (1 - self._alpha) * doc_similarity\n",
    "            )\n",
    "            print(\n",
    "                f\"Doc {doc_idx} (node score, doc similarity, full similarity): {(node.score, doc_similarity, full_similarity)}\"\n",
    "            )\n",
    "            result_tups.append((full_similarity, node))\n",
    "\n",
    "        result_tups = sorted(result_tups, key=lambda x: x[0], reverse=True)\n",
    "        # update scores\n",
    "        for full_score, node in result_tups:\n",
    "            node.score = full_score\n",
    "\n",
    "        return [n for _, n in result_tups][:out_top_k]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "top_k = 10\n",
    "out_top_k = 3\n",
    "hybrid_retriever = HybridRetriever(\n",
    "    index, docstore, similarity_top_k=top_k, out_top_k=3, alpha=0.5\n",
    ")\n",
    "base_retriever = index.as_retriever(similarity_top_k=out_top_k)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def show_nodes(nodes, out_len: int = 200):\n",
    "    for idx, n in enumerate(nodes):\n",
    "        print(f\"\\n\\n >>>>>>>>>>>> ID {n.id_}: {n.metadata['path']}\")\n",
    "        print(n.get_content()[:out_len])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_str = \"Tell me more about the LLM interface and where they're used\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Doc 0 (node score, doc similarity, full similarity): (0.8951729860296237, 0.888711859390314, 0.8919424227099688)\n",
      "Doc 3 (node score, doc similarity, full similarity): (0.7606735418349336, 0.888711859390314, 0.8246927006126239)\n",
      "Doc 1 (node score, doc similarity, full similarity): (0.8008658562229534, 0.888711859390314, 0.8447888578066337)\n",
      "Doc 4 (node score, doc similarity, full similarity): (0.7083936595542725, 0.888711859390314, 0.7985527594722932)\n",
      "Doc 2 (node score, doc similarity, full similarity): (0.7627518988051541, 0.7151744680533735, 0.7389631834292638)\n",
      "Doc 5 (node score, doc similarity, full similarity): (0.6576277615091234, 0.6506473659825045, 0.654137563745814)\n",
      "Doc 7 (node score, doc similarity, full similarity): (0.6141130778320664, 0.6159139530209246, 0.6150135154264955)\n",
      "Doc 6 (node score, doc similarity, full similarity): (0.6225339833394525, 0.24827341793941335, 0.43540370063943296)\n",
      "Doc 8 (node score, doc similarity, full similarity): (0.5672766061523489, 0.24827341793941335, 0.4077750120458811)\n",
      "Doc 9 (node score, doc similarity, full similarity): (0.5671131641337652, 0.24827341793941335, 0.4076932910365893)\n"
     ]
    }
   ],
   "source": [
    "nodes = hybrid_retriever.retrieve(query_str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      " >>>>>>>>>>>> ID 2c7b42d3-520c-4510-ba34-d2f2dfd5d8f5: docs.llamaindex.ai/en/latest/module_guides/models/llms.html\n",
      "Contributing: Anyone is welcome to contribute new LLMs to the documentation. Simply copy an existing notebook, setup and test your LLM, and open a PR with your results.\n",
      "\n",
      "If you have ways to improve th\n",
      "\n",
      "\n",
      " >>>>>>>>>>>> ID 72cc9101-5b36-4821-bd50-e707dac8dca1: docs.llamaindex.ai/en/latest/module_guides/models/llms.html\n",
      "Using LLMs\n",
      "\n",
      "Concept\n",
      "\n",
      "Picking the proper Large Language Model (LLM) is one of the first steps you need to consider when building any LLM application over your data.\n",
      "\n",
      "LLMs are a core component of Llam\n",
      "\n",
      "\n",
      " >>>>>>>>>>>> ID 7c2be7c7-44aa-4f11-b670-e402e5ac35a5: docs.llamaindex.ai/en/latest/module_guides/models/llms.html\n",
      "If you change the LLM, you may need to update this tokenizer to ensure accurate token counts, chunking, and prompting.\n",
      "\n",
      "The single requirement for a tokenizer is that it is a callable function, that t\n"
     ]
    }
   ],
   "source": [
    "show_nodes(nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "base_nodes = base_retriever.retrieve(query_str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      " >>>>>>>>>>>> ID 2c7b42d3-520c-4510-ba34-d2f2dfd5d8f5: docs.llamaindex.ai/en/latest/module_guides/models/llms.html\n",
      "Contributing: Anyone is welcome to contribute new LLMs to the documentation. Simply copy an existing notebook, setup and test your LLM, and open a PR with your results.\n",
      "\n",
      "If you have ways to improve th\n",
      "\n",
      "\n",
      " >>>>>>>>>>>> ID 72cc9101-5b36-4821-bd50-e707dac8dca1: docs.llamaindex.ai/en/latest/module_guides/models/llms.html\n",
      "Using LLMs\n",
      "\n",
      "Concept\n",
      "\n",
      "Picking the proper Large Language Model (LLM) is one of the first steps you need to consider when building any LLM application over your data.\n",
      "\n",
      "LLMs are a core component of Llam\n",
      "\n",
      "\n",
      " >>>>>>>>>>>> ID 252fc99b-2817-4913-bcbf-4dd8ef509b8c: docs.llamaindex.ai/en/latest/index.html\n",
      "These could be APIs, PDFs, SQL, and (much) more.\n",
      "\n",
      "Data indexes structure your data in intermediate representations that are easy and performant for LLMs to consume.\n",
      "\n",
      "Engines provide natural language a\n"
     ]
    }
   ],
   "source": [
    "show_nodes(base_nodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run Some Queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.query_engine import RetrieverQueryEngine\n",
    "\n",
    "query_engine = RetrieverQueryEngine(hybrid_retriever)\n",
    "base_query_engine = index.as_query_engine(similarity_top_k=out_top_k)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Doc 0 (node score, doc similarity, full similarity): (0.8951729860296237, 0.888711859390314, 0.8919424227099688)\n",
      "Doc 3 (node score, doc similarity, full similarity): (0.7606735418349336, 0.888711859390314, 0.8246927006126239)\n",
      "Doc 1 (node score, doc similarity, full similarity): (0.8008658562229534, 0.888711859390314, 0.8447888578066337)\n",
      "Doc 4 (node score, doc similarity, full similarity): (0.7083936595542725, 0.888711859390314, 0.7985527594722932)\n",
      "Doc 2 (node score, doc similarity, full similarity): (0.7627518988051541, 0.7151744680533735, 0.7389631834292638)\n",
      "Doc 5 (node score, doc similarity, full similarity): (0.6576277615091234, 0.6506473659825045, 0.654137563745814)\n",
      "Doc 7 (node score, doc similarity, full similarity): (0.6141130778320664, 0.6159139530209246, 0.6150135154264955)\n",
      "Doc 6 (node score, doc similarity, full similarity): (0.6225339833394525, 0.24827341793941335, 0.43540370063943296)\n",
      "Doc 8 (node score, doc similarity, full similarity): (0.5672766061523489, 0.24827341793941335, 0.4077750120458811)\n",
      "Doc 9 (node score, doc similarity, full similarity): (0.5671131641337652, 0.24827341793941335, 0.4076932910365893)\n",
      "The LLM interface is a unified interface provided by LlamaIndex for defining Large Language Models (LLMs) from different sources such as OpenAI, Hugging Face, or LangChain. This interface eliminates the need to write the boilerplate code for defining the LLM interface yourself. The LLM interface supports text completion and chat endpoints, as well as streaming and non-streaming endpoints. It also supports both synchronous and asynchronous endpoints.\n",
      "\n",
      "LLMs are a core component of LlamaIndex and can be used as standalone modules or plugged into other core LlamaIndex modules such as indices, retrievers, and query engines. They are primarily used during the response synthesis step, which occurs after retrieval. Depending on the type of index being used, LLMs may also be used during index construction, insertion, and query traversal.\n",
      "\n",
      "To use LLMs, you can import the necessary modules and instantiate the LLM object. You can then use the LLM object to generate responses or complete text prompts. LlamaIndex provides examples and code snippets to help you get started with using LLMs.\n",
      "\n",
      "It's important to note that tokenization plays a crucial role in LLMs. LlamaIndex uses a global tokenizer by default, but if you change the LLM, you may need to update the tokenizer to ensure accurate token counts, chunking, and prompting. LlamaIndex provides instructions on how to set a global tokenizer using libraries like tiktoken or Hugging Face's AutoTokenizer.\n",
      "\n",
      "Overall, LLMs are powerful tools for building LlamaIndex applications and can be customized within the LlamaIndex abstractions. While LLMs from paid APIs like OpenAI and Anthropic are generally considered more reliable, local open-source models are gaining popularity due to their customizability and transparency. LlamaIndex offers integrations with various LLMs and provides documentation on their compatibility and performance. Contributions to improve the setup and performance of existing LLMs or to add new LLMs are welcome.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(query_str)\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The LLM interface is a unified interface provided by LlamaIndex for defining Large Language Model (LLM) modules. It allows users to easily integrate LLMs from different providers such as OpenAI, Hugging Face, or LangChain into their applications without having to write the boilerplate code for defining the LLM interface themselves.\n",
      "\n",
      "LLMs are a core component of LlamaIndex and can be used as standalone modules or plugged into other core LlamaIndex modules such as indices, retrievers, and query engines. They are primarily used during the response synthesis step, which occurs after retrieval. Depending on the type of index being used, LLMs may also be used during index construction, insertion, and query traversal.\n",
      "\n",
      "The LLM interface supports various functionalities, including text completion and chat endpoints. It also provides support for streaming and non-streaming endpoints, as well as synchronous and asynchronous endpoints.\n",
      "\n",
      "To use LLMs, you can import the necessary modules and make use of the provided functions. For example, you can use the OpenAI module to interact with the gpt-3.5-turbo LLM by calling the `OpenAI()` function. You can then use the `complete()` function to generate completions based on a given prompt.\n",
      "\n",
      "It's important to note that LlamaIndex uses a global tokenizer called cl100k from tiktoken by default for all token counting. If you change the LLM being used, you may need to update the tokenizer to ensure accurate token counts, chunking, and prompting.\n",
      "\n",
      "Overall, LLMs and the LLM interface provided by LlamaIndex are essential for building LLM applications and integrating them into the LlamaIndex ecosystem.\n"
     ]
    }
   ],
   "source": [
    "base_response = base_query_engine.query(query_str)\n",
    "print(str(base_response))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v2",
   "language": "python",
   "name": "llama_index_v2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
